Clinical biostatistics 


XLI. Hard science, soft data, and the challenges of choosing clinical variables in research


Alvan R. Feinstein, M.D.* New Haven, Conn. 
Departments of Medicine and Epidemiology of the Yale University School of Medicine 


In the exchange that occurs when clinicians 
and statisticians collaborate to conduct a re- 
search project, the statistician is regularly asked 
for a great deal of mathematical activity. If the 
project is being planned prolectively, one of his 
assignments is to determine ‘‘sample size’’ by 
estimating how many patients will be needed 
for the research to attain stochastic sig- 
nificance.'? If the project is a clinical trial, 
the statistician is asked to prepare a suitable 
arrangement and schedule for the randomized 
allocation of the compared treatments. After the 
research data have been collected (or if he is 
first consulted after the data are already avail- 
able), the statistician is customarily given the 
job of performing a suitable ‘‘statistical 
analysis’’. For this job, he usually develops 
ways of sorting the data to display the results in 
various tables or graphs, and he then adminis- 
ters the statistical ritual used for testing the 
‘significance’ of what has been found. 

During these mathematical procedures, the 
investigating clinician has not been idle. He is 
usually responsible for creating the basic ideas 
to be tested in the research, for assembling any 
additional clinicians who participated coopera- 
tively, for recruiting the patients who volun- 
teered themselves for observation, for provid- 
ing or arranging to provide suitable care for 
those patients, and for recording the information
that describes what happened to them. At
certain key moments when the results are
evaluated, the clinician is responsible for helping
decide whether a ‘‘significant’’ finding is
substantively rather than merely stochastically
significant. 1?

Supported by Public Health Service Grant No. HS00408 from the
National Center for Health Services Research and Development.

*Professor of Medicine and Epidemiology, Yale University School
of Medicine, New Haven, Conn.; Senior Biostatistician, Cooperative
Studies Program Support Center, Veterans Administration Hospital,
West Haven, Conn.

Although this division of research territory 
seems well adapted for the different talents, 
backgrounds, and interests of the clinical and 
statistical collaborators, a crucial scientific zone 
of the territory has not yet been discussed. This 
zone deals with the strategy used for choosing 
the clinical variables that will be observed dur- 
ing the research, and with the tactics that con- 
vert the clinical observations into analyzable 
data. The unsatisfactory management of these 
strategies and tactics is responsible for most of 
the inadequately designed research projects that 
have created so much discontent, dissension, 
and controversy in clinical epidemiologic inves- 
tigation today. 

Important clinical variables may sometimes 
be deliberately omitted from the research data 
because the clinico-statistical collaborators 
have decided that the data are ‘‘soft’’ and un- 
worthy of attention. On other occasions, the 
neglect is inadvertent, because the clinician 
does not specifically mention the importance of 
the clinical variables, or because the statistician 
who is given the assignment of choosing vari- 
ables is not made aware of what to look for. 
Alternatively, information about the important 
clinical data may be collected in raw form, but 
may then be mismanaged in the way the col- 
laborators allow it to be classified, coded, and 
analyzed. Regardless of whether the errors arise 
from omission or commission, the cause of the 
difficulty is a folie à deux, in which capable




clinicians and capable statisticians—working 
diligently, protractedly, and often harmonious- 
ly—arrive at a carefully planned, meticulously 
executed, extensively publicized, and avidly 
defended clinical research project whose value 
is destroyed by its defects in clinical data. My 
purpose in this essay is to discuss the 
pathogenesis, prophylaxis, and therapy of these 
problems in choosing and analyzing clinical 
variables. 


1. Definition of clinical variables 


The first issue to be considered is the distinc- 
tion between clinical variables and all of the 
other variables used to express the data that de- 
scribe medical events in people. Demographic 
variables refer to such intrinsically personal fea- 
tures as age, gender, race, occupation, marital 
status, or religion. Paraclinical variables refer 
to the information obtained, via technologic 
procedures, in roentgenograms, in histologic or 
cytologic examinations, and in laboratory re- 
sults for chemical, microbiologic, and electro- 
graphic data. Therapeutic variables contain de- 
scriptions of the dosage, duration, or other 
characteristics of treatment. 

Data for the types of variables just cited can 
be collected by someone with relatively little 
medical training and sophistication. A good 
secretary (or an inanimate questionnaire or 
computer terminal) can conduct the interview 
for obtaining demographic details. A capable 
laboratory or radiographic technician can often 
supply the paraclinical data. An operating room 
attendant or a pharmacist can describe surgical 
procedures or pharmaceutical regimens. 

The distinguishing characteristic of clinical 
data is that they usually require sophisticated 
clinical knowledge to be acquired, interpreted, 
and appreciated. Among such data are: symp- 
tomatic variables, which express the existence 
and severity of the patient’s subjective distres- 
ses, discomforts, or other symptoms; co-morbid 
variables, which denote the concomitant occur- 
rence of additional diseases, beyond the 
‘‘main’’ disease under study; chronometric var-
iables, which cite the duration of different clini- 
cal (or sometimes paraclinical) manifestations; 
and decisional variables, which list the reasons 
why the patient (or physician) preferred one 
course of action rather than another. 


Clinical Pharmacology 
and Therapeutics 


2. Collection of the ‘basic’ data 


In ordinary medical circumstances, a physi- 
cian notices all these clinical attributes of a pa- 
tient and may even write notes about many or 
all of them in the patient’s medical record. For 
most research projects, however, the patient’s 
original medical record is not the official docu- 
ment that becomes analyzed. Instead, an ex- 
cerpt of the patient’s data is entered into a spe- 
cial format that is often called a case report 
form. The choice of what is collected on this 
form will determine the data that become the 
basic information noted in the research. Fur- 
thermore, the information noted in the case re- 
port form does not constitute the basic data that 
become analyzed. For most modern analyses, 
the collected case report data are converted, via 
various systems of classification and coding, 
into the entries that are punched on Hollerith 
(IBM) cards or affixed onto magnetic tape for 
processing with a digital computer. 

For all these reasons, the so-called basic ana- 
lytic data that are available on the computer 
cards and tape(s) of a research project may be 
quite far removed from the basic events that 
transpired during the project and from the fun- 
damental accounts of those events. Before 
reaching the punched cards or tapes, the data 
underwent at least five major transfer points at 
which the original events were selectively ob- 
served and either deliberately or inadvertently 
‘‘edited’’. These transfer points are: (1) the pa-
tient’s basic observations and reports of subjec- 
tively perceived phenomena; (2) the physician’s 
observation of reports from the patient and from 
other sources; (3) the physician’s recording and 
interpretation of these reports in the original 
medical record; (4) the transfer of data from the 
medical record to the corresponding sites in the 
case report form; and (5) the numerical coding, 
for computer storage, of what is in the case 
report form. (A sixth transfer, which is quite 
important but which will not be further dis- 
cussed here, occurs when a key-puncher does 
the work that converts the coded data into the 
punched cards or magnetized tapes transmitted 
to the computer.) 

During all these acts of transference, some- 
one must decide which data are worth preserv- 
ing, so that the information is coded, stored, 
and available for retrieval and analysis. The 




Volume 22 
Number 4 


person who makes those coding decisions will 
obviously want to collect what is important. If 
that person decides that certain data must be 
obtained for coding, the decision will percolate 
back from the coding form to the case report 
form, from the case report form to the physi- 
cian, and from physician to the doctor-patient 
interchange. But if crucial data are omitted 
from the coding procedure that prepares the 
information for processing, the variables will 
be absent from the ‘‘basic data’’ used in 
the analysis. The patient may describe his 
story well; the doctor may keep an excellent 
medical record; the case report form may be 
thoughtfully prepared and conscientiously 
submitted—but if the clinical variables are not 
suitably coded, they become excluded from the 
‘‘basic data’’.
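The loss of variables across the later transfer points can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the field names, values, and the helper `transfer` are hypothetical and are not taken from any actual case report form or coding scheme.

```python
# Hypothetical illustration: every field name and value below is invented.

# What the physician wrote in the original medical record (transfer point 3).
medical_record = {
    "age": 58,
    "serum_creatinine_mg_dl": 1.1,
    "chest_pain_severity": "severe",
    "duration_of_symptoms_months": 7,
    "reason_for_treatment_choice": "patient declined surgery",
}

# Fields that the case report form has a place for (transfer point 4).
CASE_REPORT_FIELDS = ["age", "serum_creatinine_mg_dl", "chest_pain_severity"]

# Fields for which a numerical code exists (transfer point 5).
CODING_SCHEME_FIELDS = ["age", "serum_creatinine_mg_dl"]


def transfer(record, kept_fields):
    """Copy only the fields that the next document has a place for."""
    return {name: record[name] for name in kept_fields if name in record}


case_report = transfer(medical_record, CASE_REPORT_FIELDS)
coded_data = transfer(case_report, CODING_SCHEME_FIELDS)

# Variables present in the record but absent from the analyzable data.
lost = sorted(set(medical_record) - set(coded_data))

print("coded 'basic data':", coded_data)
print("never reached the analysis:", lost)
```

In this sketch the symptomatic, chronometric, and decisional variables were all recorded by the physician, yet none of them survives to the coded data, because no later step had a slot for them.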

The decision about which variables are im- 
portant enough to be coded and stored is there- 
fore fundamental to the success of a research 
project. How is this decision made? 


3. Statistical decisions about ‘importance’ 


From a purely statistical viewpoint, several 
methods are available for deciding whether a 
variable is important. One approach, based on 
the standard deviation and coefficient of varia- 
tion for the data of a single variable, is to attri- 
bute minimal importance to data in which min- 
imal variation has occurred. For example, if 
everyone in a particular study is either 62, 63, 
or 64 inches tall, the coefficient of variation for 
height would be quite small; and height would 
probably not serve as an important dis- 
criminator among the patients. A second ap- 
proach, based on statistical calculations of cor- 
relation coefficients for pairs of variables, is to 
assume that if two variables are highly corre- 
lated, one of them is probably unimportant and 
can be eliminated. For example, if we found a 
correlation coefficient of 0.99 between the vari- 
ables color of shoes and color of shoelaces, we 
might conclude that color of shoelaces (or color 
of shoes) could be omitted from the subsequent 
analysis. 
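The first two approaches can be illustrated with a short Python sketch. The data are invented for the illustration, and coding colors as integers so that a correlation coefficient can be computed is a simplifying assumption, not a recommended way to handle nominal data.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented heights, all 62, 63, or 64 inches as in the example above.
heights = [62, 63, 64, 63, 62, 64, 63, 63]
cv = coefficient_of_variation(heights)

# Invented color codes (assumption): 0 = black, 1 = brown, 2 = white.
shoe_color = [0, 0, 1, 1, 2, 2, 0, 1]
lace_color = [0, 0, 1, 1, 2, 2, 0, 2]  # one mismatched pair
r = pearson_r(shoe_color, lace_color)

print(f"coefficient of variation of height: {cv:.3f}")
print(f"correlation of shoe and lace color: {r:.2f}")
```

A very small coefficient of variation marks height as a poor discriminator in this group, and a correlation close to 1 suggests that one of the two color variables is redundant. Both calculations require that the data already be collected and coded.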

A third statistical approach is to use a mul- 
tivariate mathematical process,!? such as factor 
analysis or cluster analysis, that explores the 
inter-relationship of a series of different vari- 
ables. After the data have been suitably mas- 




saged and rearranged according to certain arbi- 
trary mathematical principles, the process pro- 
duces the allegedly ‘‘important’’ combinations 
of variables, which are called factors or clus- 
ters. 

A fourth statistical approach is to use a dif- 
ferent kind of multivariate mathematical pro- 
cess!” for determining the relationship between 
a set of candidate variables, whose individual 
importance is to be evaluated, and a single 
target variable, whose importance is previously 
accepted. (In mathematical terms, the target 
variable is called dependent and the candidate 
variables, independent.) For this approach, a 
procedure such as multiple linear regression, 
discriminant function analysis, or multivariate 
stratification can be employed to note the simul- 
taneous effect of a large number of baseline 
candidate variables on an outcome target event 
such as death or myocardial infarction. The 
multivariate statistical procedures will yield 
standardized ‘‘partial regression coefficients’’
or other numbers that indicate each variable’s 
relative importance. With the aid of computers, 
the statistical procedure is often conducted in a 
‘‘stepwise’’ manner that permits each variable
to be evaluated at each step in the operations, 
with a decision made at that step to either in- 
clude or reject the candidate variable in the 
group deemed ‘‘important.’’
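As a minimal sketch of what a standardized partial regression coefficient expresses, the Python below handles the simplest case of two candidate variables and one target, using the textbook closed form based on the three pairwise correlations. The variable names and data are hypothetical, and a continuous target is assumed for simplicity, whereas the outcome events mentioned above (death, myocardial infarction) are binary.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standardized_partial_coefficients(x1, x2, y):
    """Standardized regression weights of x1 and x2 in predicting y.

    Two-predictor closed form:
        beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
        beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    """
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    denom = 1.0 - r_12 ** 2
    if denom == 0.0:
        raise ValueError("candidate variables are perfectly collinear")
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


# Invented baseline data for eight hypothetical patients.
performance_score = [1, 2, 2, 3, 4, 4, 5, 5]
age_years = [70, 64, 58, 66, 52, 61, 49, 55]
survival_months = [3, 6, 7, 10, 14, 13, 18, 17]

beta_perf, beta_age = standardized_partial_coefficients(
    performance_score, age_years, survival_months
)
print(f"performance score: {beta_perf:+.2f}   age: {beta_age:+.2f}")
```

The size of each coefficient is read as the variable's relative importance after the other candidate has been taken into account. A candidate that was never coded cannot appear in the calculation at all.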

Regardless of the inherent merit or mathe- 
matical ingenuity of these four statistical ap- 
proaches, they all suffer from the same funda- 
mental flaw. None of the procedures can be 
applied until after the data have been coded and 
collected. The statistical strategies can be very 
helpful for making decisions about the im- 
portance of what was coded—but the strategies 
cannot be used to determine, in advance, what 
data to observe, what to report, and what to 
code. 

(The failure of official reviewers of the con- 
troversial UGDP study?! to recognize the dif- 
ference between true basic data and coded 
‘‘basic data’’ is probably responsible for the
continued smoldering of the controversy. The 
reviewing agents believe that they checked the 
basic data, but what they checked was only the 
coded data stored in the computerized tapes. 
Applying multiple linear regression models to 
the computerized data, the statistical analysts 




claim to have ruled out any baseline disparities 
among the treated groups. Unfortunately, how- 
ever, at least sixteen major clinical variables 
were omitted! from the coded baseline infor- 
mation that was stored in the computer. The 
subsequent mathematical analyses of the coded 
data may have been sublime in conception and 
magnificent in execution, but with so many im- 
portant variables omitted from consideration, 
the results have no credibility.) 

Since mathematical modes of analysis can be 
applied only to data that have been coded, 
statistical strategies obviously cannot be used
either to detect important data that were not 
coded or to demonstrate whether the coding 
process was properly performed. The only 
method by which the clinico-statistical col- 
laborators can find, prevent, and remedy these 
errors is by using good judgment. By reviewing 
previous clinical experience, evaluating pub- 
lished literature, and consulting knowledgeable 
clinicians, the collaborators can learn what in- 
formation is worth collecting. 


4. Clinical judgments about ‘importance’ 


Few statisticians are comfortable about hav- 
ing to make major decisions on the basis of 
‘‘clinical judgment’’.® The judgment may not 
be expressed in clear, specific terms and may 
not be accompanied by any direct supporting 
evidence. The evidence, when offered, may be 
anecdotal rather than documentary, vague 
rather than precise, and qualitative rather than 
quantitative. The clinician may be unable to cite 
the characteristics of the group of patients from 
whom the judgment was derived, or the cited 
characteristics may clearly indicate that the pa- 
tients were nonrepresentative. If the judgment 
deals with a decision about the importance of a 
variable, the clinician may not stipulate either 
the particular target affected by the variable or 
the intellectual, scientific, or other mechanism 
by which the importance was discerned. Thus, 
the clinician may say that ‘‘return of appetite’’
is a ‘‘good prognostic sign’’, without listing the 
particular entity—such as survival, relief of 
pain, or return to gainful employment—whose 
outcome is being predicted; without quantifying 
the magnitude or likelihood of ‘‘good’’; and 
without defining exactly what is meant by ‘‘re- 
turn of appetite’’. 




Nevertheless, despite these imprecisions, the 
clinicians’ judgments are often accurate, valid, 
and compelling. When the judgments are ac- 
tually solicited, expressed, coded, and con- 
verted into analyzable data, the statistician may 
be shocked by their cogency. For example, al- 
though oncologists receiving statistical guid- 
ance have spent several decades using a 
cancer’s cellular type as the main variable for 
predicting patients’ outcomes, thoughtful clini-
cians have regularly known that the patient’s 
functional condition was of prime importance in 
prognostic judgment. When clinicians were 
asked to employ prognostic judgment rather 
than paraclinical histology in forecasting the 
outcome of treatment for cancer, a statistician 
used the results to inform his colleagues that 
‘Doctors ain’t so dumb’’.3! When the judg- 
ments were allowed deliberate expression, even 
in so crude a variable as ‘‘performance status’’, 
a consortium of prominent statisticians?’ ex- 
pressed astounded surprise that the variable was 
prognostically ‘‘more important than histologic 
type, disease extension, or any of the usual in- 
formation!’’. 

A clinician’s inability to provide documen- 
tary evidence for a judgmental decision is, of 
course, irrelevant to the question of whether or 
not to code and analyze the variables suggested 
by the judgment. If the variables are coded, the 
results can be analyzed, documentary evidence 
will be obtained, and the accuracy of the judg- 
ment will be sustained or refuted in the col- 
lected data. If the variables are not examined, 
clinicians and statisticians will continue to be 
deprived of the valuable information that might 
be provided. Since almost all thoughtful clini- 
cians will readily acknowledge the importance 
of clinical variables, the current fashion of 
omitting such variables from analysis cannot be 
attributed to a non-recognition of their value. 
The fashion arises not from ignorant neglect of 
the data’s importance, but from entrenched 
ideology about the data’s quality. The clinical 
variables are rejected or shunned because they 
are regarded as ‘‘soft’’ and unreliable, and 
therefore unworthy of scientific attention.


5. The creed of ‘hard’ and ‘soft’ data 


The words hard and soft are constantly used 
during discussions of statistical data, but are 




seldom specifically defined. The words do not 
appear in A Dictionary of Statistical Terms*4 
and are not listed in the index of about 60
statistical or biostatistical textbooks in which I have just
conducted an ad hoc search. Despite the ab- 
sence of a definition, the idea of hard data 
exerts an enormous statistical and scientific ap- 
peal: it is what ‘‘good’’ research should con- 
tain. 

At least two different concepts of data, how- 
ever, are included under the label of hard. One 
of these concepts refers to the general form of 
architecture used in the research structure with 
which the data were collected. Thus, data from 
a randomized, controlled, experimental clinical 
trial are regarded as harder than data from a 
non-randomized, non-experimental survey. 
Data from a ‘‘case control’’ (trohoc) survey, 
where the epidemiologist obtained the informa- 
tion directly from each person, are regarded as 
harder than survey data from comparisons of 
mortality rates in different geographic locali- 
ties, where all the information is indirect, with 
the denominators of the rates coming from cen- 
sus bureaus, the numerators coming from death 
certificates, and the epidemiologist acting as a 
scorekeeper. 

In most discussions of data, however, and in 
the rest of this essay, the idea of hardness refers 
to the interior decoration of the research struc- 
ture, not to its architecture. In this alternative 
concept, the hardness of the data depends on the 
quality and reliability of the fundamental, raw 
elements of information. 

One attribute that helps make data hard is that 
the information be acquired objectively rather 
than subjectively. Thus, a doctor’s observation 
of whether the patient has tenderness in a knee 
is regarded as objective and is therefore harder 
than the patient’s observation and report of the 
pain experienced in the knee when he walks. 
Another common component of hardness is that 
the observed entity should be preservable, so 
that it can be re-observed and checked. Thus, a 
roentgenogram of the knee (or a specimen of 
blood, urine, or tissue, or an electrocardiogram) 
can be saved for re-examination and is therefore 
harder than the patient’s symptom, or the doctor’s
palpatory sensation, whose occurrence is
not preservable. A third desirable characteristic 
of hardness is measurement on a dimensional 




rather than ordinal or nominal scale. A 
goniometric measurement of the knee is thus 
harder than a description of its pain, tenderness, 
or roentgenographic appearance. Height, 
weight, serum cholesterol, and width of Q 
waves are all measured dimensionally and are 
therefore harder than briskness of reflexes, 
urgency of sickness, and severity of dyspnea, 
which are expressed in ordinal scales (such as 
0, 1+, 2+, ... or none, mild, moderate,
severe); or histologic type of cancer, which is 
expressed in a nominal scale (such as anaplas- 
tic, epidermoid, etc.). 

These attributes of objectivity, preservabili- 
ty, and dimensionality occur in the specimens 
examined in modern clinical laboratories and 
allow application of the quality control proce- 
dures that make the measurements reliable and 
hard, but none of these three attributes is neces- 
sary for hardness. The crucial attribute for 
hardness is reliability, which can be attained 
even when the observation is subjective, non- 
preservable, and non-dimensional. For exam- 
ple, such data as death, Caucasian, anaplastic 
cancer, and occlusion of left main coronary ar- 
tery are usually regarded as hard, although sub- 
jectively observed and non-dimensionally cited. 
A palpatory sensation, such as a ballotable 
patella or a hard lymph node, is nonpreservable 
as well as subjective and non-dimensional, but 
the data might be accepted as hard if a quintet 
(or trio) of suitable clinical experts, examining 
independently, all agreed on what was reported. 
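The criterion of unanimous agreement among independent examiners can be sketched as a simple tally. The panel size, findings, and function names below are invented for illustration.

```python
# Invented findings: each row is one patient, each column one examiner
# who palpated the knee independently of the others.
panel_reports = [
    ("ballotable", "ballotable", "ballotable"),
    ("not ballotable", "not ballotable", "not ballotable"),
    ("ballotable", "not ballotable", "ballotable"),
    ("ballotable", "ballotable", "ballotable"),
]


def unanimous(reports):
    """True when every examiner reported the same finding."""
    return len(set(reports)) == 1


def unanimity_rate(all_reports):
    """Fraction of patients on whom the whole panel agreed."""
    return sum(unanimous(r) for r in all_reports) / len(all_reports)


rate = unanimity_rate(panel_reports)
print(f"panel agreed unanimously on {rate:.0%} of patients")
```

Nothing in the tally depends on the observation being objective, preservable, or dimensional. It depends only on whether independent observers report the same thing.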

The need for reliable data is obviously a basic 
essential of science and the quest for such data 
is obviously laudable and desirable. Unfortu- 
nately, however, the contemporary pursuit of 
hard data has gone beyond the boundaries of 
necessity and desirability. In the current fash- 
ions of clinico-statistical research, the idea of 
hard data has often become a creed, rather than 
a goal. As a result of the creed, hard data have 
been excessively venerated to an extent far ex- 
ceeding their inherent importance or actual reli- 
ability, and soft data have been not merely 
de-emphasized, but deliberately excluded or 
eliminated from consideration. For the clinico-statistical
worshippers of this creed, soft data
are not just ‘‘dirty’’ and sinful; they are sca- 
brous horrors, to be expunged from civilized 
numeracy. 




Despite the clear, uncontrovertible virtues of 
hard data, the state of clinical science has been 
deteriorated rather than improved by the associ- 
ated creed, because it produces some major pit- 
falls that are seldom recognized or suitably 
evaluated for their ‘‘adverse side effects’’. 


6. The pitfalls of the ‘hard-data’ creed 


The devout reverence for hard data is associ- 
ated with at least four major deleterious conse- 
quences: the credulous acceptance of ‘‘hard’’
information that is unreliable; the substitution 
of irrelevant or distorted hard data for important 
soft data; the inflation of effort and costs of 
clinical trials for which unnecessarily huge 
sample sizes have been calculated; and the 
production of dehumanized science in patient 
care. 

a. The frequent unreliability of hard data. 
Although most clinical chemists know that lab- 
oratory data must be constantly monitored and 
checked for quality control, clinicians have 
commonly accepted the dimensional values of 
paraclinical data as ‘‘hard’’ merely because 
they are dimensional. Every laboratory proce- 
dure can produce a wrong result from time to 
time, and for some procedures (or some labora- 
tories), the results may be wrong more often 
than they are right. In a cooperative clinical 
trial, the opportunity for errors or inconsisten- 
cies in paraclinical data is much greater than for 
research conducted at a single institution, be- 
cause so many different laboratories are in- 
volved. Yet almost no attention may be given to 
determining whether or not the hard data col- 
lected in the trial are actually reliable.

When the attention finally occurs, the inves- 
tigators may be stunned by the results. For 
example, after the famous UGDP study had 
been in progress for six years, the coordinating
center in 1966 issued the distressed announce- 
ment?” that ‘‘periodic shipments of coded stan- 
dards to each of the participating laborato- 
ries has revealed rather marked variations 
among the clinics in the quality and repro- 
ducibility of the creatinine determinations’’. In 
the same announcement, the urinary protein 
measurements were noted to be even more unre- 
liable than the creatinine determinations. These 
problems made the biostatistical coordinators of 




the trial express doubts that the results would 
ever ‘‘be useful in detecting differences among 
the treatment groups’’. (Either the unreliable 
measurements were somehow corrected retro-
spectively or else the original doubts were sub- 
sequently exorcized, because the UGDP statis- 
ticians eventually proceeded”® to use the 
creatinine and urinary protein data for major 
analytic comparisons.)

The problems of inter-institutional variation 
and unreliability in paraclinical data are now 
often prevented in a cooperative study by ar- 
ranging for all pertinent specimens to be sent to 
a single ‘‘central laboratory’’ where the test
procedure is carefully standardized and checked 
for accuracy. The central-laboratory strategy 
takes care of the problems in a pre-planned 
cooperative study, but it does not deal with the 
many problems of interpreting laboratory data 
that are ‘‘pooled’’ from diverse, non-‘‘cen- 
tralized’’ sources in a trial or survey. 

Furthermore, the central-laboratory strategy 
is seldom used, although particularly necessary, 
in the many situations where the ‘‘apparatus’’ 
that provides the data is a physician rather than 
an inanimate machine. Statisticians and clini- 
cians have been extraordinarily naive in unques- 
tioningly accepting, as hard data, the reports 
that are generated by pathologists and 
radiologists. For example, when the observer 
variability of pathologists has been checked for 
diagnosing the histopathology of cell types in 
cancer?’ 5, 10, 20, 22, 25, 30, 32, 33, 35 the results 
usually show so many discrepancies and incon- 
sistencies that the histopathologic distinctions 
may become impossible to interpret. Undaunted 
by this gross unreliability, clinicians continue to 
plan treatment according to cellular types; and 
statisticians continue to accept, code, and 
analyze the data. 

Radiologists have similar problems in consis- 
tency. Whenever radiologists have been studied 
for interobserver and intraobserver agreement 
in the interpretation of films for pulmonary le- 
sions?! 3% 40, for heart size!8, for coronary arte- 
riography® ™ 42, or for other key data*4, the var- 
iability has been striking. Yet roentgenographic 
data are often calmly and credulously accepted 
as hard and reliable. In some clinical trials, 
where eligibility for the study depended on the 




interpretation of roentgenographic findings, pa- 
tients who were randomized have later had 
to be de-admitted from the study when the 
roentgenographic evidence was reviewed. 
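One common way to quantify agreement between two readers of the same films is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The studies cited above are not stated here to have used this index, so it is offered only as an illustrative technique, and the readings below are invented.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Chance-corrected agreement between two readers of the same films."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(reader_a)
    # Proportion of films on which the two readers gave the same reading.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected if each reader assigned categories independently
    # at his own observed rates.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Invented readings of ten chest films by two hypothetical radiologists.
reader_a = ["lesion", "lesion", "normal", "normal", "lesion",
            "normal", "normal", "lesion", "normal", "normal"]
reader_b = ["lesion", "normal", "normal", "normal", "lesion",
            "normal", "lesion", "lesion", "normal", "normal"]

kappa = cohen_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates complete agreement and a kappa near 0 indicates agreement no better than chance. An index of this kind can be computed for any ‘‘apparatus’’, human or inanimate, before its output is accepted as hard.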

Epidemiologists are even more credulous 
than clinicians in accepting and analyzing 
grossly unreliable data. Clinicians who are 
familiar with the sources of information at their 
own institution may have good grounds for 
being confident about quality of data produced 
by the radiologist, pathologist, or clinical lab- 
oratory. No scientific investigator, however, 
has any grounds for confidence about the diag- 
noses recorded as the ‘‘cause of death’’ in a 
collection of death certificates.? Nevertheless, 
with complacent imperturbability, epidemi- 
ologists carry out extensive analyses for data 
based on the causes of death cited in com- 
pendia of death certificates accumulated in dif- 
ferent countries and eras. The only ‘‘scientific 
standard’’ that epidemiologists use for ‘‘qual- 
ity’’ in death certificate data is a quantitative 
requirement. If a non-specific cause of death 
(such as old age) is recorded in more than 15% 
of the death certificates, the data are regarded as 
unreliable’. 
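The quantitative requirement just described amounts to a single threshold test, sketched below. Only the 15% threshold comes from the text. The list of non-specific causes and the two sets of certificates are invented for illustration.

```python
# Hypothetical list of causes treated as non-specific (assumption).
NONSPECIFIC_CAUSES = {"old age", "senility", "unknown"}

# The 15% threshold described in the text.
MAX_NONSPECIFIC_FRACTION = 0.15


def nonspecific_fraction(causes):
    """Fraction of certificates that list a non-specific cause of death."""
    return sum(c.lower() in NONSPECIFIC_CAUSES for c in causes) / len(causes)


def passes_quantitative_standard(causes):
    """True when no more than 15% of certificates are non-specific."""
    return nonspecific_fraction(causes) <= MAX_NONSPECIFIC_FRACTION


# Two invented collections of twenty certificates each.
region_a = ["myocardial infarction"] * 18 + ["old age"] * 2
region_b = ["myocardial infarction"] * 16 + ["old age", "unknown"] * 2

print("region A accepted:", passes_quantitative_standard(region_a))
print("region B accepted:", passes_quantitative_standard(region_b))
```

The test counts only how often a vague cause was written. It says nothing about whether the specific diagnoses on the remaining certificates are correct.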

Finally, even when the raw data are correct 
and presumably reliable, they are sometimes 
not properly employed. For example, in the 
UGDP trial®*, 69 patients were enrolled in the 
study despite failure to fulfill a simple quantita- 
tive criterion of glucose intolerance. Even after 
the original error was discovered, the patients 
were maintained in the study, receiving such 
unnecessary treatments as an invariant daily in- 
jection of insulin at fixed dosage. 

Just as all that glistens is not gold, all data 
that are ‘‘hard’’ are not necessarily reliable. 

b. The replacement of meaningful soft data 
by irrelevant or distorted hard data. As a 
result of hard-data ideology, the ‘‘substitution 
game’’ described by Yerushalmy*! has become 
more than just a popular sport in large-scale 
clinical trials. It has become almost a normative 
standard in design. 

The game begins when the clinical inves- 
tigator suggests getting information about an 
important, soft-data variable. ‘‘No’’, says the
consultant, ‘‘we must not use anything that is so 
soft. Find something that is harder’’. The col- 




laborators then agree upon a substitute that is 
hard, but irrelevant or distorting of what the 
investigator wanted to know. For example, in 
almost all the trials!® in which anticoagulants 
were used to prevent thromboembolism in pa- 
tients with myocardial infarction, the statistical 
data never contained an assessment of whether 
or not the patients developed thromboem- 
bolism; the only end point under study was 
death. In all of the trials in which chemotherapy 
has been used to provide ‘‘palliation’’ for pa- 
tients with cancer, the occurrence of palliation 
is almost never statistically reported for such 
‘‘soft’’ phenomena as pain, anorexia, ability to
work, and other aspects of quality of life. The
‘‘palliative’’ accomplishments of oncologic
treatment are usually assessed only with such 
hard data as survival time and size of tumor. 
In the celebrated UGDP investigation of vas- 
cular complications in diabetics, vascular com- 
plications were never described in a manner that 
would be recognizable to most clinicians!*. The 
main end point variable was death; and ‘‘vascu-
lar complications’’ were defined statistically,
rather than clinically, according to a medically 
bizarre set of calculations and quantifications. 
Diabetic neuropathy in the legs, for example, 
was measured according to the dimensional re- 
sult of a biothesiometric test of vibration in the 
flesh pulp of an index finger. Congestive heart 
failure was identified only according to the pa- 
tient’s report of use of digitalis. Antecedent 
myocardial infarction, as a baseline entity be- 
fore treatment, was never identified. Instead, 
patients were cited as having dimensionally 
specified electrocardiographic abnormalities. 
Because the hard-data creed has dominated 
the planning of massive, federally funded clini- 
cal trials, the trials have often been distracted 
from their original objectives, and have often 
failed to clarify the main clinical issues for 
which the research was allegedly intended. In 
addition to the abundant examples provided by 
the celebrated UGDP study, another set of il- 
lustrations comes from the Coronary Drug Proj- 
ect*, which is the largest and costliest clinical 
trial that has ever hitherto been reported. The 
main clinical purpose of the trial was to deter- 
mine whether patients with myocardial infarc- 
tion are benefited by having their serum lipids
lowered. The only ‘‘benefit’’ assessed in the 
trial, however, was prevention of death, which 
turned out to have essentially similar rates in 
most of the treated and untreated patients. The 
investigators have never reported whether pa- 
tients were benefited in terms of relief of symp- 
toms, ability to work, and other aspects of qual- 
ity of life. Furthermore, the death rates for the 
CDP groups have been presented only in terms 
of the treatment regimen to which the patients 
were assigned. The death rates have never been 
reported according to groups of patients in 
whom serum lipids were or were not actually 
lowered. 

As John Tukey*¢ has pointed out: ‘‘Far better 
an approximate answer to the right question
. . . than an exact answer to the wrong ques-
tion’’.

c. The inflation of sample sizes. The de- 
mand for a hard-data end point alters many 
therapeutic trials from a remedial to a 
prophylactic goal. Thus, if the end point is re- 
lief of symptoms, the trial is remedial; if the end 
point is the development of a vascular compli- 
cation or death, the trial is prophylactic, since 
success is measured according to how well the 
target event has been prevented. Because the 
occurrence rate is usually much lower for the 
appearance of a non-existing entity than for the 
change of something that already exists, the 
sample sizes needed for stochastic significance 
in a trial become greatly inflated if the end point 
is converted from a remedial soft-data event to a 
prophylactic hard-data event; or if a soft-data 
prophylactic event, with a relatively high rate of 
occurrence, is altered to an infrequently occur- 
ring hard-data event. 

For example, suppose sample size is being 
calculated for a clinical trial intended to test a 
‘‘control’’ vs. an ‘‘active’’ therapy for the
treatment of stable angina pectoris. Suppose
further that we will regard ‘‘success’’ as an out-
come in which the relative results for one treat-
ment are 30% better than for the other; that we
want the α level of ‘‘significance’’ to be .05 and
the β level to be .10; and that we have the
choice of a remedial soft-data end point, such as 
improvement of symptoms, or a prophylactic 
hard-data end point, such as death. 

For the soft-data end point, let us assume that 
70% of the control group will improve. The rate
of improvement required for a relative increase 
of 30% in the treated group will then be 91%. 
We are now ready to calculate the sample size 
of each group, according to the classical for- 
mula! 


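$$n = \frac{\left[ z_\alpha \sqrt{2\bar{p}(1-\bar{p})} + z_\beta \sqrt{p_1(1-p_1) + p_2(1-p_2)} \right]^2}{\Delta^2}.$$


Using 2-sided levels for α and β we find (from
conventional statistical tables) that $z_\alpha = 1.96$
for $\alpha = .05$, and $z_\beta = 1.64$ for $\beta = .10$.
According to our assumptions, $p_1 = .70$,
$p_2 = .91$, and $\Delta = p_2 - p_1 = .21$. Since
$\bar{p} = (p_1 + p_2)/2$, $\bar{p} = .805$. We now insert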
these values into the formula, grind out the cal- 
culations, and find that n = 89 for each treated 
group. The total sample size required for the 
two groups would be 178 patients. 

For the hard-data end point, let us assume 
that the expected fatality rate in the control 
group is 10%. For a 30% reduction, the rate of 
death in the actively treated group should 
be 7%. With these assumptions, $p_1 = .10$,
$p_2 = .07$, $\Delta = -.03$, and $\bar{p} = .085$. Using the
same values of $z_\alpha = 1.96$ and $z_\beta = 1.64$, we
enter the formula for n, do the calculations, and
find that n = 2237. The total sample size re-
quired would be 4474. Thus, by using a hard-
data end point, we have made the sample size
25 times larger (= 4474 ÷ 178) than would
have been needed with the soft-data end point. 
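
The two calculations above can be retraced in a few lines of code. What follows is a minimal sketch rather than part of the original analysis: the function name `sample_size_per_group` and its argument names are assumptions chosen for illustration. The formula coded is the two-proportion expression whose terms (the z values for α and β, the two rates, their mean, and their difference Δ) are defined in the text, with n rounded to the nearest whole patient.

```python
from math import sqrt


def sample_size_per_group(p1, p2, z_alpha=1.96, z_beta=1.64):
    """Patients needed in each group to compare two proportions.

    p1, p2  : expected rates in the control and active groups
    z_alpha : z value for the chosen alpha level (1.96 in the text)
    z_beta  : z value for the chosen beta level (1.64 in the text)
    """
    p_bar = (p1 + p2) / 2          # mean of the two expected rates
    delta = p2 - p1                # difference to be detected
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return round(numerator / delta ** 2)


# Remedial soft-data end point: 70% vs. 91% improvement
soft_n = sample_size_per_group(0.70, 0.91)

# Prophylactic hard-data end point: 10% vs. 7% fatality
hard_n = sample_size_per_group(0.10, 0.07)

print(soft_n, 2 * soft_n)   # per group, both groups
print(hard_n, 2 * hard_n)
```

With the z values used in the text, the sketch gives 89 and 2237 patients per group, or 178 and 4474 in total, which is the roughly 25-fold inflation described above. Changing `p1` and `p2` shows how quickly the required n grows as the expected event becomes rarer.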

A believer in the clinico-statistical creed 
would now respond by saying, ‘‘Yes, the 
hard-data end point creates a massive increase 
in the number of patients, costs, and effort 
needed for the trial. But aren’t we better off 
spending the extra money and work to get an 
answer that is reliable, rather than an answer 
that will be equivocal or controversial because 
it depends on soft data?’’ The answer to this 
question is to point out, first of all, that the 
question—like many issues stated in ideological 
terms—is oversimplified. It does not take into 
account the basic purpose of the trial. Was the 
trial intended to measure improvement of pa- 
tient’s status or prolongation of life? If prolon- 
gation of life was the main purpose, then the 
hard-data end point is obviously necessary. If 
the main purpose of treatment, however, was to 
improve the status of patients, then the use of a 
hard-data end point has displaced the goal of the
trial, as well as inflated its costs, efforts, and 
sample size. 

Furthermore, the creedal question is based 
on the assumption that the hard-data end point 
is the only possible reliable end point. With 
that assumption, all the extra work and costs 
are expended for getting the enormous numbers 
of patients who must be entered and followed 
in the trial. There is an alternative approach 
to reliability, however. Before the trial begins, 
some extra work and costs can be directed at 
improving the ‘‘hardness’’ and reliability of the 
desired soft-data end point. The investment of 
effort in making a soft-data end point 
reliable—by methods outlined later in this 
essay—will be abundantly compensated by 
smaller sample sizes, by more clinically perti- 
nent results, and by the opportunity to use the 
improved data in future investigations. If soft 
data are constantly rejected merely because they 
are soft, with no efforts made to improve their 
quality, the scientifically ‘‘vicious cycle’’ is
perpetuated. One hard-data trial with bombasti- 
cally inflated sample sizes merely leads to an- 
other bombastic hard-data trial. During the 
ever-increasing spiral of efforts and costs, the 
opportunity to improve the quality of clinical 
variables is ignored or lost. 

d. The dehumanization of clinical science. 
Since all of the uniquely human phenomena of 
patients are expressed in soft clinical data, the 
exclusion of such data creates a biostatistical 
clinical science that is deliberately de- 
humanized.'! The results that emerge from the 
trials do not contain information about the 
things that a practicing doctor and a patient 
might want to know in choosing treatment. 
The baseline clinical condition of the treat- 
ed patients—in severity, co-morbidity, and 
chronometry—is not described in the statistical 
results; and the patients’ post-therapeutic clini- 
cal conditions—in comfort, function, and qual- 
ity of life—are also omitted. In reviewing the 
published results, a doctor and patient can find 
out what happened to death rates per treatment, 
but not what clinical kinds of patients were 
treated, how each kind was affected by each 
treatment, and what was the total clinical im-
pact on the patient or on the patient’s family.

The clinical science that emerges is hard and 
reliable, but is unsatisfactory as a guide to clini-
cal practice because of its dehumanization. The 
investigators have systematically excluded the 
distinctly human clinical features that distin- 
guish the practice of medicine from an abstract 
exercise in statistics. 


7. Choosing the important clinical 
variables 

Once a decision is made to pay attention to 
clinical data, many variables become evident as 
having major importance. Some of the variables 
cited earlier will be recapitulated here, and a 
few additional ones will be added. 

a. Prognostically important variables. The 
first group of clinical variables to be cited are 
those whose value has already been demon- 
strated in prognostication. 

(1). Types of symptoms. Within the spectrum 
of a single disease, some patients will have 
primary symptoms, some will have secondary
symptoms (or complications), and some will be 
asymptomatic. The patient’s baseline location 
in this spectrum of symptom types has been 
shown to have major prognostic importance®. 

(2). Severity of symptoms. Prognosis also de- 
pends on the severity of the individual symp- 
toms. For example, a patient with symptoms of 
intractable congestive heart failure is worse off 
prognostically than someone whose cardiac de- 
compensation creates only mild, easily con- 
trolled edema. Someone who has lost only 5% 
of basic body weight is better off than someone 
who has become cachectic. 

(3). Severity of co-morbidity. The diseases 
that co-exist in addition to the main disease 
have already been shown to have dramatic ef- 
fects on the outcome of such ailments as can- 
cer!* 16 and diabetes mellitus?®. 

(4). ‘Performance status’. A patient’s func- 
tional capacity to work or to carry out acts of 
daily living has been shown to be prognostically 
important, but its importance arises mainly be- 
cause ‘‘performance status’’ depends on the 
severity of symptoms and severity of co- 
morbidity. If both of the latter variables have 
been coded, performance status may be a re- 
dundant variable. Furthermore, as noted later, 
performance status may be assessed inaccu- 
rately because it is also affected by ‘‘psy- 
chologic status’’. 

(5). Chronometry. The duration of symptoms
and paraclinical manifestations of a disease are 
important for demonstrating how long the dis- 
ease has been present and for estimating the 
disease’s auxometry, or rate of progression. 
Since slow-growing tumors produce symptoms 
slowly, the prognostic importance of chro- 
nometry has now been demonstrated in sev- 
eral cancers.? & 14 

b. Therapeutically important variables. A 
second group of important variables are those 
that involve an assessment of the way a 
therapeutic intervention has been performed, 
and those that deal with the outcome of treat- 
ment (or some other intervention). 

(1). Performance. This variable refers not to 
such misnomers as performance status for a pa- 
tient’s functional capacity, but to the skill with 
which an operative procedure is carried out or 
to the compliance?® with which a patient main- 
tains an oral pharmaceutical regimen. The re- 
sults of technologically difficult surgical opera- 
tions cannot be properly evaluated without suit- 
able appraisal of the skill of the surgical team 
and of the quality of the associated anesthesia 
and recovery-room procedures. The accom- 
plishments of oral pharmaceutical therapy re- 
quire an assessment of how well the patient has 
complied with the prescribed instructions. Both 
of these important variables are regularly omit- 
ted from the data of clinical trials. 

(2). Regulation. When a treatment is carried 
out for the purpose of regulating entity A in 
order to prevent event B, the success of the 
regulation for A is often ignored when the oc- 
currence of event B is reported. For example, 
hypertension, hyperglycemia, and hyper- 
lipidemia are all regulated for the purpose of 
preventing subsequent cardiovascular complica- 
tions. Nevertheless, the occurrence of the com- 
plications is seldom reported in relation to the 
degree of regulation (as excellent, good, poor, 
etc.) for patients receiving hypotensive, hypo- 
glycemic, or hypolipidemic agents in large- 
scale clinical trials. The complications are regu- 
larly reported for groups of patients who were 
assigned a treatment or who complied with the 
assignment, but not according to the regulation 
that was achieved. 
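
The contrast between reporting by assigned treatment and reporting by achieved regulation can be sketched as a simple tabulation. The example below is only an illustration under stated assumptions: the field names (`assigned`, `regulation`, `event`), the helper `event_rates`, and every record are hypothetical and invented for the sketch; they come from no trial and carry no empirical meaning.

```python
from collections import defaultdict


def event_rates(records, key):
    """Tabulate events by the grouping field `key`.

    Returns {group: (events, patients, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])   # group -> [events, patients]
    for rec in records:
        tally = counts[rec[key]]
        tally[0] += rec["event"]
        tally[1] += 1
    return {g: (e, n, e / n) for g, (e, n) in counts.items()}


# Hypothetical records, invented purely to show the two tabulations.
records = [
    {"assigned": "active",  "regulation": "excellent", "event": 0},
    {"assigned": "active",  "regulation": "excellent", "event": 0},
    {"assigned": "active",  "regulation": "poor",      "event": 1},
    {"assigned": "active",  "regulation": "poor",      "event": 1},
    {"assigned": "control", "regulation": "good",      "event": 0},
    {"assigned": "control", "regulation": "poor",      "event": 1},
    {"assigned": "control", "regulation": "poor",      "event": 0},
    {"assigned": "control", "regulation": "excellent", "event": 0},
]

# The customary report: complications per assigned regimen.
by_assignment = event_rates(records, "assigned")

# The report the text asks for: complications per degree of regulation.
by_regulation = event_rates(records, "regulation")

print(by_assignment)
print(by_regulation)
```

The same records yield both tables; the second is available only if the degree of regulation was recorded for each patient in the first place.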

(3). Co-intervention. This variable refers to 
additional treatments or other unplanned inter-
ventions that a patient received in addition to 
the scheduled main treatment. The co- 
interventions can substantially alter the effects 
of the main treatment and may also, without 
affecting the main treatment, provide clues to 
the existence of major co-morbidity that may 
otherwise be overlooked. Nevertheless, data 
about co-interventions may be omitted, or as- 
sembled in an incomplete manner, or tabulated 
in long ‘‘laundry lists’’6 without any clinically 
effective classifications and analyses. 

(4). Detection. This variable refers to the in- 
tensity of the search and to the diagnostic 
criteria applied for the outcome events that fol- 
low treatment. Unless these processes of detec- 
tion have been carried out equally for the pa- 
tients in the compared treatment groups, the 
rates of the detected outcomes cannot be fairly 
compared. Although ‘‘double-blind’’ tech- 
niques are generally used to eliminate or reduce 
such bias in clinical observations, investigators 
regularly ignore the many other ways in which 
detection bias can occur. An example of this 
problem is demonstrated!* by the unequal rates 
at which necropsy was performed for patients 
who died in the UGDP study and by the ‘‘un- 
blinded’’ way in which the excerpted data about 
dead patients were prepared for a subsequent 
‘‘blind’’ review. Another manifestation of the
problem may be the alleged increase in 
cholelithiasis that has been associated with 
lipid-lowering regimens. Since many gallstones 
are asymptomatic, any event that leads to an 
increase in cholecystography will lead to an in- 
creased detection of silent stones that would 
otherwise be unnoticed. If the lipid-lowering 
regimens also produce gastrointestinal symp- 
toms, unrelated to cholelithiasis, the cholecys- 
tography that is ordered during the ‘‘workup’’ 
of the symptoms may reveal many such si-
lent stones, which will then fallaciously be at-
tributed to the treatment rather than to detec-
tion bias. Nevertheless, the intensity of the
diagnostic search for gallstones is never cited 
when the occurrence rates of gallstones are 
compared. 

c. Decisional variables. A different but gen- 
erally neglected kind of important clinical in- 
formation can be called decisional data: the 
reasons why the patient or the physician chose
one particular course of action rather than an- 
other. By inquiring about the reasons why cer- 
tain decisions were made, the attending physi- 
cian or the investigator can get clues to impor- 
tant clinical variables, or can learn additional 
useful information that might otherwise be 
omitted. 

For example, suppose non-surgical treat- 
ments are being compared in patients classified 
as inoperable for a particular disease, such as 
cancer or coronary artery disease. The spectrum 
of inoperable patients contains at least three dis- 
tinctly different groups: (1) those whose 
anatomic lesion is beyond the boundaries of 
surgical operability; (2) those who have an op- 
erable anatomic lesion, but who are denied 
surgery because of severe co-morbidity; and (3) 
those who have an operable lesion and who are 
also in otherwise excellent health, but who re- 
fuse surgery when it is offered. Since these 
three groups of inoperable patients have distinc- 
tively different prognoses, the usual demands of 
science would call for the groups to receive 
separate analyses of post-therapeutic response. 
In the clinico-statistical creed of the past few 
decades, however, neither the demands of sci- 
ence nor the logic of clinical medicine receives 
suitable attention. The reasons for the decision 
about operability are ignored; the inoperable pa- 
tients are all lumped together as though they 
were homogeneous; and the investigators hope 
that somehow the process of randomization will 
rectify all the flaws. Since randomization can- 
not possibly provide clarity for information that 
was ignored, the flaws become embellished or 
disguised rather than removed. 

Another type of decisional data has been dis- 
cussed elsewhere® under the title of the 
iatrotropic stimulus: the reason that the patient 
chose to seek medical attention at the time he 
did and from the doctor he selected. The infor- 
mation uncovered in response to this inquiry 
can often be helpful in establishing whether or 
not a patient is truly asymptomatic, in evaluat- 
ing the severity of symptoms, and in determin- 
ing which, if any, of several co-morbid diseases 
was responsible for the patient’s main com- 
plaints. 

In studies of the efficacy of diagnostic tests, 
the diagnostropic stimulus, which is the reason
why the doctor ordered a particular test, can 
provide valuable data. Such information can be 
especially helpful in determining the dis- 
criminating capacities?’ of the test in screening, 
case-finding, or differential diagnosis; in noting 
whether the test’s results are being used for 
non-diagnostic purposes in reassurance or in 
prognostic and therapeutic decisions; and in as- 
sessing problems of detection bias. 

Of the many other kinds of decisional vari- 
ables, only one more will be noted here: the 
reason for a patient’s functional limitations. 
The lack of suitable attention to this type of 
information is a defect not in statistical creeds, 
but in clinical history-taking. Many clinicians 
believe that a patient’s symptoms cannot be 
accepted as reliable, because they seem to be 
too greatly affected by complex psychologic or 
social phenomena. For many purposes, how- 
ever, the psychologic and social phenomena are 
quite simple, and the source of the unreliability 
is often the history taker, not the history giver. 

Consider a patient with chronic, stable an- 
gina pectoris who says he has not worked for 
eight months. A naive history taker may record 
that the angina keeps the patient from working. 
A knowledgeable history taker will ask why the 
patient has not been working. The question has 
four basic types of answer: (1) the patient has 
engaged in occupational activities, but has 
stopped them because they provoke episodes of 
angina; (2) eight months ago the patient’s doc- 
tor told him not to work, but neither he nor his 
doctor knows whether occupational activities 
would now provoke angina; (3) the patient does 
not get angina at work, but has stopped working 
because he fears it may provoke angina or 
worsen his general cardiac status; and (4) the 
patient has never had angina at work but is un- 
employed because of a general reduction of the 
labor force at his factory, and he has been un- 
able to find a new job. Of these possible reasons 
for not working, only the first is truly attribut- 
able to the pathophysiologic impact of the an- 
gina. The second and third reasons are 
prophylactic, and the fourth is incidental. 
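
The four answer types and their three-way classification can be written down as a small coding scheme. This is a sketch under stated assumptions, not a form the author describes: the names `LimitationReason`, `ANSWER_TYPE_TO_REASON`, and `attributable_to_angina` are hypothetical, and the integer codes 1 to 4 simply follow the numbering of the answers in the text.

```python
from enum import Enum


class LimitationReason(Enum):
    """Why a functional limitation (here, not working) has occurred."""
    PATHOPHYSIOLOGIC = "pathophysiologic"   # the activity provokes angina
    PROPHYLACTIC = "prophylactic"           # avoided on advice or from fear
    INCIDENTAL = "incidental"               # unrelated to the disease


# Answer types (1) to (4) as numbered in the text, mapped to their class.
ANSWER_TYPE_TO_REASON = {
    1: LimitationReason.PATHOPHYSIOLOGIC,   # stopped because work provokes angina
    2: LimitationReason.PROPHYLACTIC,       # doctor advised against working
    3: LimitationReason.PROPHYLACTIC,       # patient fears work may provoke angina
    4: LimitationReason.INCIDENTAL,         # laid off, cannot find a new job
}


def attributable_to_angina(answer_type):
    """True only when the limitation reflects the impact of the angina itself."""
    return ANSWER_TYPE_TO_REASON[answer_type] is LimitationReason.PATHOPHYSIOLOGIC


for answer_type in sorted(ANSWER_TYPE_TO_REASON):
    reason = ANSWER_TYPE_TO_REASON[answer_type]
    print(answer_type, reason.value, attributable_to_angina(answer_type))
```

Recording the reason as one extra coded field is enough to separate patients whose limitation is pathophysiologic from those in whom it is prophylactic or incidental.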

Now consider a patient whose unemployment is
of prophylactic origin, and who receives a sur- 
gical bypass graft for treatment of the angina 
pectoris. After the operation, the patient suc-
cessfully returns to work, having been en- 
thusiastically urged to do so by the surgeon. 
Had the patient received such enthusiastic urg- 
ing before the operation, he also might have 
successfully returned to work. The recorded 
data may indicate, however, only that he was 
not able to work before surgery and was able to 
do so afterward. An ‘‘improvement’’ inspired 
by enthusiastic iatrotherapy thus becomes at- 
tributed to a surgical bypass graft, because the 
effects of therapy were not assessed, as they 
should have been, only in patients whose physi- 
cal limitations are known to be pathophysio- 
logic. 

This problem in post hoc reasoning with in- 
adequate data cannot be removed by ran- 
domized trials comparing medical vs. surgical 
therapy for angina pectoris. The problem will 
persist as long as the data are inadequate, no 
matter how the treatments are assigned or ‘‘con-
trolled’’. The solution to the problem requires
no intensive awareness of human psychology 
and no in-depth interviews about childhood, 
parenthood, siblinghood, or any of the other 
-hoods that occupy classical psychoanalysis. 
All that is necessary is a simple, straightforward 
question, asking about the reason why a particu- 
lar limitation has occurred. The answer is 
usually also straightforward; its interpretation 
requires no psychiatric training; and the physi- 
cal limitation can readily be classified as 
pathophysiologic, prophylactic, or co-inci- 
dental. The great barrier to acquiring and 
analyzing this information is not the vicis- 
situdes of the human psyche. The problem is 
that doctors have often not been taught either 
how to take an effective clinical history or how 
to make effective use of the data. 


8. Acquiring the necessary data 


Even if all the foregoing demonstrations and 
arguments are accepted, a devotee of the 
clinico-statistical creed still has a major fall- 
back position. This last refuge is the argument 
that the requisite clinical data cannot be ac- 
quired, because clinicians will either be un- 
cooperative, or, if cooperative, will be too in- 
consistent, imprecise, or unstandardized in the 
way they make their observations and record 
their results. The argument emerges from a pre-
judice that has many of the characteristics of a 
self-fulfilling prophecy. If the clinico-statistical 
investigators begin with the assumption that 
clinicians will do a poor job of observation, the 
forms for acquiring data are usually designed in 
a way that guarantees a poor job. No space may 
be provided for the necessary data to be re- 
corded; or the available slots to be checked 
may be designed inadequately. (The case-report 
forms of the celebrated UGDP study, as noted 
in an earlier essay*® in this series, provide excel- 
lent examples of this type of problem. Even if 
the UGDP clinicians were prepared to provide 
superb reports of clinical observation, they 
could not do so. The data-acquisition forms 
would not permit it.) 

The challenges of improving the quality of 
clinical data are extensive enough to warrant a 
separate dissertation, which will be reserved for 
a later installment. The challenges will require 
attention to methods of improving (rather than 
merely quantifying) observer variability; strate- 
gies of establishing effective indexes, scales, 
and criteria for classifying clinical phenomena; 
tactics that are applicable for ‘‘measuring’’ 
such entities as chronometry and severity; and 
an appreciation of the importance of evaluating 
transitions as transitions, rather than as two 
separate states of existence. All of these ac- 
tivities are part of a lamentably underdeveloped 
territory that is now in desperate need of cre- 
ative clinico-statistical attention: the domain of 
clinimetrics. 


References 


1. Armstrong, B. K., Mann, J. L., Adelstein, A.
M., and Eskin, F.: Commodity consumption
and ischemic heart disease mortality, with spe-
cial reference to dietary practices, J. Chron. Dis.
28:455-469, 1975.

2. Charlson, M. E., and Feinstein, A. R.: The au-
xometric dimension. A new method for using
rate of growth in prognostic staging of breast
cancer, J. A. M. A. 228:180-185, 1974.

3. Coppleson, L. W., Factor, R. M., Strum, S. B.,
Graff, P. W., and Rappaport, H.: Observer dis-
agreement in the classification and histology of
Hodgkin’s disease, J. Natl. Cancer Inst.
45:731-740, 1970.

4. Coronary Drug Project. Design, methods, and
baseline results, Circulation 47:Suppl. 1, 1973.

5. Correa, P., O’Conor, G. T., Berard, C. W.,
Axtell, L. M., and Myers, M. H.: International
comparability and reproducibility in histologic
subclassification of Hodgkin’s disease, J. Natl.
Cancer Inst. 50:1429-1435, 1973.

6. DeRouen, T. A., Murray, J. A., and Owen, W.:
Variability in the analysis of coronary arterio-
grams, Circulation 55:324-328, 1977.

7. Detre, K. M., Wright, E., Murphy, M. L., and
Takaro, T.: Observer agreement in evaluating
coronary angiograms, Circulation 52:979-986,
1975.

8. Feinstein, A. R.: Clinical judgment, reprinted,
1974, Huntington, N. J., Robert Krieger Co.

9. Feinstein, A. R.: Clinical epidemiology. II. The
identification rates of disease, Ann. Intern.
Med. 69:1037-1061, 1968.

10. Feinstein, A. R., Gelfman, N. A., and Yesner,
R., with collaboration of Auerbach, O., Hackel,
D. B., and Pratt, P. C.: Observer variability in
histopathologic diagnosis of lung cancer, Am.
Rev. Respir. Dis. 101:671-684, 1970.

11. Feinstein, A. R.: The need for humanised sci-
ence in evaluating medication, Lancet 2:421-
423, 1972.

12. Feinstein, A. R.: Clinical biostatistics. XXI. A
primer of concepts, phrases, and procedures in
the statistical analysis of multiple variables,
CLIN. PHARMACOL. THER. 14:462-477, 1973.

13. Feinstein, A. R.: Clinical biostatistics. XXXIV.
The other side of ‘statistical significance’:
Alpha, beta, delta, and the calculation of sample
size, CLIN. PHARMACOL. THER. 18:491-505,
1975.

14. Feinstein, A. R., Schimpff, C. R., and Hull, E.
W., with the technical assistance of Bidwell, H.
L.: A reappraisal of staging and therapy for pa-
tients with cancer of the rectum. I. Development
of two systems of staging, Arch. Intern. Med.
135:1441-1453, 1975.

15. Feinstein, A. R.: Clinical biostatistics. XXXV.
The persistent clinical failures and fallacies of
the UGDP study, CLIN. PHARMACOL. THER.
19:78-93, 1976.

16. Feinstein, A. R., Schimpff, C. R., Andrews,
Jr., J. F., and Wells, C. K.: Cancer of the
larynx: A new staging system and a re-appraisal
of prognosis and treatment, J. Chron. Dis.
30:277-305, 1977.

17. Feinstein, A. R.: Clinical biostatistics. XL.
Stochastic significance, consistency, apposite
data, and some other remedies for the in-
tellectual pollutants of statistical vocabulary,
CLIN. PHARMACOL. THER. 22:113-123, 1977.

18. Frieden, J., Shapiro, J. H., and Feinstein, A.
R.: Radiologic evaluation of heart size in
rheumatic heart disease. Studies in young pa-
tients, Arch. Intern. Med. 111:44-50, 1963.

19. Gifford, R. H., and Feinstein, A. R.: A critique
of methodology in studies of anticoagulant ther-
apy for acute myocardial infarction, N. Engl. J.
Med. 280:351-357, 1969.

20. Gilles, F. H., Winston, K., Fulchiero, A., and
Leviton, A.: Histologic features and observa-
tional variation in cerebellar gliomas in children,
J. Natl. Cancer Inst. 58:175-181, 1977.

21. Herman, P. G., Gerson, D. E., Hessel, S. J.,
Mayer, B. S., Watnick, M., Blesser, B., and
Ozonoff, D.: Disagreements in chest roentgen
interpretation, Chest 68:278-282, 1975.

22. Holmquist, N. D., McMahan, C. A., and
Williams, O. D.: Variability in classification of
carcinoma in situ of the uterine cervix, Arch.
Pathol. 84:334-345, 1967.

23. Kaplan, M. H., and Feinstein, A. R.: The im-
portance of classifying initial co-morbidity in
evaluating the outcome of diabetes mellitus, J.
Chron. Dis. 27:387-404, 1974.

24. Kendall, M. G., and Buckland, W. R.: A dic-
tionary of statistical terms, ed. 3, Edinburgh,
1971, Oliver & Boyd, Ltd.

25. Kopf, A. W., Mintzis, M., and Bart, R. S.:
Diagnostic accuracy in malignant melanoma,
Arch. Dermatol. 111:1291-1292, 1975.

26. Miller, M., Knatterud, G. L., Hawkins, B. S.,
and Newberry, W. B., Jr.: A study of the effects
of oral hypoglycemic agents on vascular com-
plications in patients with adult-onset diabetes.
VI. Supplementary report on nonfatal events in
patients treated with tolbutamide, Diabetes
25:1129-1153, 1976.

27. Peto, R., Pike, M. C., Armitage, P., Breslow,
N. E., et al.: Design and analysis of randomized
clinical trials requiring prolonged observation of
each patient, II. Analysis and examples, Br. J.
Cancer 35:1-39, 1977.

28. Sackett, D. L., and Haynes, R. B., editors:
Compliance with therapeutic regimens, Balti-
more, Md., 1976, Johns Hopkins University
Press.

29. Sackett, D. L., and Holland, W. W.: Con-
troversy in the detection of disease, Lancet
2:357-359, 1975.

30. Saksela, E., and Rintala, A.: Misdiagnosis of
prepubertal malignant melanoma, Cancer
22:1308-1314, 1968.

31. Schneiderman, M. A.: Doctors ain’t so dumb,
Amer. Sci. 49:250A, 1961.

32. Siegler, E. E.: Microdiagnosis of carcinoma in
situ of the uterine cervix: A comparative study
of pathologists’ diagnoses, Cancer 9:463-469,
1956.

33. Sissons, H. A.: Agreement and disagreement be-
tween pathologists in histologic diagnosis,
Postgraduate Med. J. 51:685-689, 1975.

34. Smith, M. J.: Error and variation in diagnostic
radiology, Springfield, Ill., 1967, Charles C.
Thomas, Publisher.

35. Truax, H., Barnett, R. N., Hukill, P. B.,
Campbell, P. C., and Eisenberg, H.: Effect of
inaccurate pathological diagnosis on survival
statistics for melanoma: Survey of cases in the
Connecticut Tumor Registry, Cancer 19:1543-
1547, 1966.

36. Tukey, J. W.: The future of data analysis, Ann.
Math. Statist. 33:1-67, 1962.

37. UGDP: The relation of treatment of diabetes
mellitus to the development of vascular disease.
A six year progress report, submitted to the Na-
tional Institute of Arthritis and Metabolic Dis-
eases, July 30, 1966.

38. University Group Diabetes Program: A study of
the effects of hypoglycemic agents on vascular 
complications in patients with adult-onset dia- 
betes. Part I. Design, methods, and baseline 
characteristics,; Part II. Mortality results, Dia- 
betes 19:(Suppl. 2):747-830, 1970. 

Weitzman, S., Pocock, W. A., Hawkins, D. 


40. 


4l. 


42. 


Clinical Pharmacology 
and Therapeutics 


M., and Barlow, J. B.: Observer variation in 
radiological assessment of pulmonary vascula- 
ture, Br. Heart J. 36:280-290, 1974. 
Yerushalmy, J.: Statistical problems in assessing 
methods of medical diagnosis, with special ref- 
erence to X-ray techniques, Pub. Health. Rep. 
62:1432-1449, 1947. 

Yerushalmy, J.: On inferring causality from ob- 
served associations, in Ingelfinger, F. J., Rel- 
man, A. S., and Finland, M., editors: Controv- 
ery in internal medicine, Philadelphia, 1966, W. 
B. Saunders Co. 

Zir, L. M., Miller, S. W., Dinsmore, R. E., 
Gilbert, J. P., and Harthorne, J. W.: Interob- 
server variability in coronary angiography, Cir- 
culation 53:627, 1976. 




